Peer to peer (P2P) lending is known for offering easier and faster loans than complicated
traditional lending institutions. Big data and machine learning are therefore needed for credit risk
analysis, especially to identify potential defaulters. However, data imbalance and high computational
cost degrade machine learning prediction performance. This paper proposes stacking ensemble learning
with feature selection based on embedded techniques (gradient boosted trees (GBDT), random forest (RF),
adaptive boosting (AdaBoost), extra gradient boosting (XGBoost), light gradient boosting machine (LGBM),
and decision tree (DT)) to predict the credit risk of individual borrowers on P2P lending. The stacking
ensemble model is created from a stack of the meta-learners used in feature selection. The feature
selection+stacking model produces an average of 94.54% accuracy and 69.10 s execution time. The RF
meta-learner+stacking ensemble is the best classification model, and the LGBM meta-learner+stacking
ensemble has the fastest execution time. Based on the experimental results, this paper shows that
credit risk prediction for P2P lending can be improved by using the stacking ensemble model in
addition to proper feature selection.

Keywords: Credit risk; Embedded technique; Feature selection; Peer to peer lending; Stacking ensemble model

1. INTRODUCTION

Peer to peer (P2P) lending platforms were launched as a modern banking system to overcome the
complexity of conventional loans, based on the concept of lending and borrowing money directly without
intermediaries. The growth of P2P lending has been very rapid since the 2008-2009 financial crisis that
followed the Lehman Brothers collapse. The complexity of conventional loan transactions is one reason
more lending has moved to P2P platforms. Moreover, traditional financial institutions do not fully cater
to risk-seeking lenders and high-risk borrowers [1]. Therefore, P2P lending platforms are becoming
increasingly popular, since lenders have more flexibility to pick and choose the desired risk
portfolio [2].

Traditional lending institutions operate at a low-risk level. P2P lending bridges this gap by
offering easy lending for small businesses or beginners. Low interest rates and transaction flexibility
are the main attractions of P2P lending. Despite these attractions, P2P lending cannot ensure the
creditworthiness of borrowers who are granted loans quickly. Generally, financial institutions use
scorecards that contain statistical information on prospective borrowers for credit risk analysis [3],
[4]. Recently, machine learning has been applied to predict the credit risk of online loans with very
encouraging performance [5]-[9]. Moreover, the number of features and the balance of the data (whether
the number of borrowers who pay on time is almost the same as the number of default borrowers) affect
the prediction results, which online lending institutions and borrowers can take advantage of.

Most of the available P2P lending data is unbalanced (a few borrowers default, while most repay on
time). If this data is used to train a machine learning model, nearly every borrower will be classified
as a good or no-risk borrower. The resulting prediction accuracy is very high, but high accuracy does
not guarantee that the model is correct. Unbalanced data can bias a model toward always predicting the
same class, whether default or non-default [10], [11]. In such circumstances, the model is often
over-trained and biased toward the dominant classes of the available data. Therefore, achieving
accurate predictions for bad borrowers is very important. In addition, compared to traditional banking
systems, P2P lending does not have sufficient information about financial statistics and historical
customer data. The model should also be computationally lightweight, so finding the essential features
to reduce the cost of computing becomes a more pressing issue. Fewer features improve classification
accuracy and generalization if appropriately chosen [12], [13].
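
The gap between high accuracy and useful default prediction can be illustrated with a minimal sketch. The 95/5 class split and the always-predict-majority model below are hypothetical and chosen only for illustration; they are not taken from the Lending Club data.

```python
def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label equals the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def recall(y_true, y_pred, positive=1):
    """Fraction of truly positive samples that were predicted positive."""
    predictions_for_positives = [p for t, p in zip(y_true, y_pred) if t == positive]
    return sum(p == positive for p in predictions_for_positives) / len(predictions_for_positives)


# Hypothetical imbalanced sample: 95 on-time borrowers (0) and 5 defaulters (1).
y_true = [0] * 95 + [1] * 5
# A degenerate model that always predicts the majority class.
y_pred = [0] * 100

majority_accuracy = accuracy(y_true, y_pred)
default_recall = recall(y_true, y_pred, positive=1)
print(f"accuracy={majority_accuracy:.2f}, default recall={default_recall:.2f}")
```

The degenerate model scores well on accuracy while identifying none of the defaulters, which is why a metric focused on the minority class is worth reporting next to accuracy.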

The imbalance of multi-class data (the number of samples from one or several classes is greater
than that of the others) is a challenge for the prediction process. This data imbalance can potentially
reduce the model's prediction performance [14]. Therefore, several studies carried out a data
pre-processing stage to balance the data [15]. The approach used to reduce the dimensions of P2P
lending data is feature selection [16]. The working concept of this approach is to keep features that
are considered important and remove features that are not important in the prediction process.
Removing non-essential attributes has several advantages, such as reducing memory and computational
costs, improving precision, and avoiding over-fitting problems during the training stage [17]. On the
other hand, some features that serve certain algorithms (e.g., decision tree (DT)) may not be practical
for other models, such as regression models. In addition, irrelevant features can negatively affect
model performance. Therefore, data pre-processing and feature selection are the most significant steps
in designing and selecting the best model for a particular problem [18].

Good predictive performance can be achieved through a feature selection approach [19]. Filter,
wrapper, embedded, and hybrid methods are the types of this approach. The main objective of this
research is to improve the model's performance, avoid over-fitting problems, and reduce the dimensions
of the input data. Although feature selection has certain drawbacks, it is an important pre-processing
technique for machine learning (ML). It generates additional information and provides an intuitive
understanding of typical patterns before the proposed classifier is used [20].

This study uses embedded techniques, i.e., random forest (RF) importance [21]-[25] and boosting
(gradient boosted trees (GBDT), extra trees (ET), adaptive boosting (AdaBoost), extra gradient boosting
(XGBoost), light gradient boosting machine (LGBM), and decision tree (DT)) importance [26]-[30], for
feature selection to improve credit risk identification in P2P lending. This study proposes a stacking
ensemble approach built from several machine learning techniques (GBDT, RF, AdaBoost, XGBoost, LGBM,
and DT) and compares their performance. Accuracy, the most commonly used metric, is used to assess the
proposed implementation. This paper makes two contributions: i) a classifier that combines feature
selection based on embedded techniques with a stacking ensemble learning model and ii) feature
selection using embedded techniques.


2. METHOD 

The ML ensemble model for P2P lending classification is described in this section. In general, the
proposed model is used for the training and testing process on the collected data. Meanwhile, k-fold
cross-validation (CV) is used to reduce overfitting in training by averaging the classifier's
performance across folds. The basic idea is to split the data into k folds and iterate k times, each
time training on k-1 folds and testing on the remaining fold; this study uses k = 3. We apply feature
selection to select essential features in credit risk prediction to improve prediction performance.
For feature selection, we tested embedded techniques involving GBDT, RF, AdaBoost, XGBoost, LGBM, and
DT. The workflow of the proposed framework is presented in Figure 1.
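
The cross-validation step can be sketched as follows. This is a minimal illustration in plain Python, not the authors' implementation: the helper names `k_fold_indices` and `cross_validated_accuracy`, the toy labels, and the majority-class predictor are assumptions made for the example, and the folds are contiguous (real data would normally be shuffled or stratified first).

```python
def k_fold_indices(n_samples, k):
    """Yield (train_idx, test_idx) pairs for k contiguous folds."""
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, test_idx
        start += size


def cross_validated_accuracy(fit_predict, X, y, k=3):
    """Average the held-out accuracy of fit_predict over k folds."""
    scores = []
    for train_idx, test_idx in k_fold_indices(len(X), k):
        preds = fit_predict(
            [X[i] for i in train_idx],
            [y[i] for i in train_idx],
            [X[i] for i in test_idx],
        )
        correct = sum(p == y[i] for p, i in zip(preds, test_idx))
        scores.append(correct / len(test_idx))
    return sum(scores) / len(scores)


def majority_class_model(X_train, y_train, X_test):
    """Hypothetical stand-in classifier: predict the most frequent training label."""
    majority = max(set(y_train), key=y_train.count)
    return [majority] * len(X_test)


# Toy data: 9 samples, every third one is a defaulter (label 1).
X = list(range(9))
y = [0, 0, 1, 0, 0, 1, 0, 0, 1]
mean_accuracy = cross_validated_accuracy(majority_class_model, X, y, k=3)
print(f"3-fold mean accuracy: {mean_accuracy:.3f}")
```

Each sample is used for testing exactly once, so the averaged score reflects performance on data the model did not see during training.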


2.1. Data collection and pre-processing 

The original data set was collected from the Lending Club website, one of the most popular P2P
platforms in the US. The raw data covers the first quarter of 2019 to the fourth quarter of 2019 and
contains 42,538 borrowers with 161 features. Initial exploration of the dataset revealed many columns
with more than 68.51% missing values, which were removed. As a result, 127 features were dropped,
reducing the number of features from 161 to 34. Because the predictive model may still become too
complex, the most recent literature was consulted to narrow the feature space further.
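
The missing-value filter can be sketched in a few lines. Only the 68.51% threshold comes from the text; the helper name `drop_sparse_columns`, the column names, and the toy rows are hypothetical.

```python
MISSING_THRESHOLD = 0.6851  # fraction of missing values quoted in the text


def drop_sparse_columns(rows, threshold=MISSING_THRESHOLD):
    """Remove columns whose fraction of missing (None) values exceeds threshold."""
    columns = list(rows[0].keys())
    n_rows = len(rows)
    kept = []
    for col in columns:
        missing = sum(1 for row in rows if row[col] is None)
        if missing / n_rows <= threshold:
            kept.append(col)
    filtered = [{col: row[col] for col in kept} for row in rows]
    return filtered, kept


# Hypothetical records; "sparse_note" is missing in 3 of 4 rows (75%).
rows = [
    {"loan_amnt": 5000, "annual_inc": 42000, "sparse_note": None},
    {"loan_amnt": 12000, "annual_inc": None, "sparse_note": None},
    {"loan_amnt": 8000, "annual_inc": 55000, "sparse_note": "refinance"},
    {"loan_amnt": 3000, "annual_inc": 31000, "sparse_note": None},
]
filtered_rows, kept_columns = drop_sparse_columns(rows)
print(kept_columns)
```

Columns with some gaps but below the threshold (here `annual_inc`) are kept and would still need imputation or row-level handling before training.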

2.2. Feature selection

Principal component analysis (PCA) is commonly used to reduce the dimensions of the data. However,
the mapped low-dimensional features have no significant effect, and PCA cannot distinguish the
importance of features in the classification process, especially when deciding whether or not to grant
a loan. A technique that selects the essential features is therefore needed.

Figure 1. Process steps for implementing feature selection methods and stacking ensemble learning model


The alternative is feature selection, and one approach to feature selection is the embedded
technique. Embedded techniques carry out feature selection as part of constructing the machine learning
algorithm. In other words, they perform feature selection during model training, which is why they are
called embedded methods. The learning algorithm takes advantage of its own variable selection process
and simultaneously performs feature selection and classification or regression. All embedded methods
work the same way: first, they train a machine learning model. They then derive feature importances
from this model, which measure how important the features are when making predictions. Finally, they
remove unimportant features based on the derived importances.
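
The three steps above can be sketched with scikit-learn. This is an illustrative example under stated assumptions, not the authors' pipeline: the data is synthetic (`make_classification` with 34 features to mirror the reduced feature count), only RF is shown as the importance source, and the median-importance cut-off is an assumption because the paper does not state its threshold.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel

# Synthetic, imbalanced stand-in for the lending data (assumption).
X, y = make_classification(
    n_samples=500, n_features=34, n_informative=8, n_redundant=4,
    weights=[0.85, 0.15], random_state=0,
)

# Step 1: train the model.
forest = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)

# Step 2: derive the feature importances from the trained model.
importances = forest.feature_importances_

# Step 3: remove features whose importance falls below the median (assumed cut-off).
selector = SelectFromModel(forest, threshold="median", prefit=True)
X_selected = selector.transform(X)
kept_indices = selector.get_support(indices=True)

print(f"kept {X_selected.shape[1]} of {X.shape[1]} features")
```

Swapping `RandomForestClassifier` for another tree-based estimator that exposes `feature_importances_` gives the corresponding variant, and each estimator may keep a different subset of features.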


2.3. Stacking ensemble model 

Furthermore, Martin et al. [31] introduced the stacking method as an ensemble algorithm distinct
from bagging, RF, and boosting: stacking considers heterogeneous learners. The schematic diagram of the
stacking method is shown in Figure 2. There are usually two or more levels of classifiers. The first
level is level zero and contains the base classifiers that take the original input. As seen in
Figure 2, H0 is the original dataset, which is the P2P lending dataset in our problem. The zero-level
classifiers generate the H1 dataset, which is used in the second level by the meta-classifier (or
first-level classifier). H1 is the dataset generated by the base classifiers: GBDT, RF, AdaBoost,
XGBoost, LGBM, and DT. CBi denotes a base classifier used to generate the H1 dataset, and CMi denotes
the meta-classifier used to classify the H1 dataset. H1 can consist of probabilities or labels,
depending on which output of CBi is passed to CMi. We compare the two options.

Figure 2. A schematic diagram of the stacking ensemble model
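
The two-level scheme and the probability-versus-label choice for H1 can be sketched with scikit-learn's `StackingClassifier`. This is a minimal sketch under several assumptions: the data is synthetic, only three of the six base classifiers are included (XGBoost and LGBM need separate packages and are omitted), and logistic regression as the meta-classifier is a placeholder because the text does not name the one used.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (
    GradientBoostingClassifier,
    RandomForestClassifier,
    StackingClassifier,
)
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# H0: synthetic, imbalanced stand-in for the original dataset (assumption).
X, y = make_classification(
    n_samples=300, n_features=20, n_informative=6,
    weights=[0.8, 0.2], random_state=0,
)

# Level-zero (base) classifiers; their outputs form the H1 dataset.
base_classifiers = [
    ("gbdt", GradientBoostingClassifier(n_estimators=30, random_state=0)),
    ("rf", RandomForestClassifier(n_estimators=30, random_state=0)),
    ("dt", DecisionTreeClassifier(max_depth=4, random_state=0)),
]


def stacking_accuracy(stack_method):
    """3-fold CV accuracy when H1 holds probabilities or hard labels."""
    model = StackingClassifier(
        estimators=base_classifiers,
        final_estimator=LogisticRegression(max_iter=1000),  # assumed meta-classifier
        stack_method=stack_method,
        cv=3,  # H1 is built from out-of-fold base predictions
    )
    return cross_val_score(model, X, y, cv=3, scoring="accuracy").mean()


scores = {method: stacking_accuracy(method) for method in ("predict_proba", "predict")}
for method, score in scores.items():
    print(f"H1 from {method}: mean accuracy {score:.4f}")
```

Building H1 from out-of-fold predictions (the inner `cv=3`) keeps the meta-classifier from being trained on outputs the base classifiers produced for samples they had already seen.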


3. RESULTS AND DISCUSSION 

The experimental results are discussed in this section. All experiments were carried out using a
well-known binary-class P2P lending dataset. To measure the performance of the ML model, 3-fold
cross-validation (3-fold CV) was used to calculate the mean accuracy. The best features were determined
using the feature importance algorithm of embedded techniques involving GBDT, RF, AdaBoost, XGBoost,
LGBM, and DT. Important features are chosen based on the weight value of each feature generated during
the predictive analysis process. Details of the best feature selection results are shown in Figure 3
(in appendix).

Figure 3 shows that each meta-learner in the embedded technique produces a different set of
important features based on the weight value of each feature, so the choice of meta-learner strongly
affects the accuracy score generated by the evaluation model. This study used a stacking ensemble
learning model created from a stack of the meta-learners used in feature selection. The accuracy scores
and execution times of the stacking ensemble model, by type of meta-learner used in the feature
selection process, are compared in Table 1.

Based on Table 1, the stacking ensemble model produces an average of 94.54% accuracy and 69.10 s
execution time. The RF meta-learner+stacking ensemble is the best classification model, and the LGBM
meta-learner+stacking ensemble has the fastest execution time. Compared with the original model, the
feature selection+stacking model increases prediction accuracy, with an average difference of 1.22
percentage points between the two. DT is the meta-learner with the largest accuracy gain over the
original model.

However, the feature selection+stacking model is not efficient in execution time: the original
model requires less execution time than the feature selection+stacking model. This limitation does not
significantly affect feature selection, because the time difference between the original model and the
feature selection+stacking model is still acceptable in the computational process. The detailed
comparison of the feature selection+stacking model and the original model is given in Table 1.


Table 1. Comparison of the feature selection (FS)+stacking model and the original model

Meta-learner    Accuracy (%)                      Execution time (s)
                FS+Stacking      Original         FS+Stacking      Original
GBDT            92.5             91.95            70.68            14.28
AdaBoost        92.5             90.05            68.9             3.43
XGBoost         92.32            92.16            68.94            3.87
LGBM            92.46            92.45            66.41            0.7
DT              92.31            88.5             69.61            0.95
RF              92.54            92.16            70.11            8.74
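
The summary statistics discussed for Table 1 can be recomputed from its rows with a short sketch. The values are copied from the table; the variable names are illustrative. Computed this way, the mean execution time and the mean accuracy gain agree with the 69.10 s and roughly 1.22 points quoted in the text, while the row-wise mean of the FS+Stacking accuracy column comes out near 92.44%, which does not match the quoted 94.54%; the sketch only reports what the tabulated rows give.

```python
# Rows of Table 1: (FS+Stacking accuracy %, original accuracy %,
#                   FS+Stacking time s, original time s)
table = {
    "GBDT": (92.5, 91.95, 70.68, 14.28),
    "AdaBoost": (92.5, 90.05, 68.9, 3.43),
    "XGBoost": (92.32, 92.16, 68.94, 3.87),
    "LGBM": (92.46, 92.45, 66.41, 0.7),
    "DT": (92.31, 88.5, 69.61, 0.95),
    "RF": (92.54, 92.16, 70.11, 8.74),
}


def column_mean(index):
    """Mean of one column of the table across all meta-learners."""
    return sum(row[index] for row in table.values()) / len(table)


mean_fs_accuracy = column_mean(0)
mean_original_accuracy = column_mean(1)
mean_fs_time = column_mean(2)
mean_accuracy_gain = mean_fs_accuracy - mean_original_accuracy

best_model = max(table, key=lambda m: table[m][0])       # highest FS+Stacking accuracy
fastest_model = min(table, key=lambda m: table[m][2])    # lowest FS+Stacking time
largest_gain = max(table, key=lambda m: table[m][0] - table[m][1])

print(f"mean FS+Stacking accuracy: {mean_fs_accuracy:.2f}%")
print(f"mean FS+Stacking time: {mean_fs_time:.2f} s")
print(f"mean accuracy gain: {mean_accuracy_gain:.2f} points")
print(best_model, fastest_model, largest_gain)
```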

4. CONCLUSION

This paper discusses the challenges of standard P2P lending data sets, such as high dimensions,
small sample sizes, and unbalanced class labels. A feature selection technique based on the embedded
technique is introduced. The most important features of the P2P lending data set were extracted within
the framework with GBDT, RF, AdaBoost, XGBoost, LGBM, and DT. Each meta-learner in the embedded
technique produces a different set of important features based on the weight value of each feature, so
the choice of meta-learner greatly affects the accuracy score generated by the evaluation model. The
stacking ensemble model produces an average of 94.54% accuracy and 69.10 s execution time. The RF
meta-learner+stacking ensemble is the best classification model, and the LGBM meta-learner+stacking
ensemble has the fastest execution time.

Figure 3. Feature importance based on the embedded techniques